Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

servstate: remove on-check-failure test races #272

Merged
merged 2 commits into from
Aug 2, 2023

Conversation

flotter
Copy link
Contributor

@flotter flotter commented Aug 2, 2023

This patch improves the following tests, all related to checker functionality in servstate observed when running under stress:

TestOnCheckFailureRestartWhileRunning
TestOnCheckFailureRestartDuringBackoff
TestOnCheckFailureIgnore
TestOnCheckFailureShutdown

Since we are unit testing servstate, and not checkstate, this patch adds code to more effectively control (mock) when the check failure actions are delivered during the test. This removes the requirement for tests to rely on finely tuned delays to check various service life-cycle states.

The following changes are made:

  • Round some of the delays to multiples of 100ms to highlight the fact these values are less important now. Delays intended to present infinite are replaced with 10s to ensure severe failures are observed if the test does not work as intended.

  • Remove iteration code to wait for a checker action, and replace with channel sync code.

  • Add some additional function comments to explain what the tests aim to confirm.

This patch improves the following tests, all related to checker
functionality in servstate observed when running under stress:

TestOnCheckFailureRestartWhileRunning
TestOnCheckFailureRestartDuringBackoff
TestOnCheckFailureIgnore
TestOnCheckFailureShutdown

Since we are unit testing servstate, and not checkstate, this patch
adds code to more effectively control (mock) when the check failure
actions are delivered during the test. This removes the requirement
for tests to rely on finely tuned delays to check various service
life-cycle states.

The following changes are made:

- Round some of the delays to multiples of 100ms to highlight the
  fact these values are less important now. Delays intended to
  present infinite are replaced with 10s to ensure severe failures
  are observed if the test does not work as intended.

- Remove iteration code to wait for a checker action, and replace
  with channel sync code.

- Add some additional function comments to explain what the tests
  aim to confirm.
internals/overlord/servstate/manager_test.go Outdated Show resolved Hide resolved
internals/overlord/servstate/manager_test.go Outdated Show resolved Hide resolved
@benhoyt
Copy link
Contributor

benhoyt commented Aug 2, 2023

Thanks for this! Definitely a nice incremental improvement. I updated the PR to fix the nits I mentioned so I can merge right away.

@benhoyt benhoyt merged commit f593810 into canonical:master Aug 2, 2023
13 checks passed
@barrettj12
Copy link
Contributor

@flotter synchronising on a channel seems like a good idea. Nonetheless, I am still observing intermittent failures in the following tests:

TestOnCheckFailureRestartWhileRunning
$ stress go test ./internals/overlord/servstate -check.f TestOnCheckFailureRestartWhileRunning -count=1 -race
2023-08-03T00:16:46.601Z [test] Service "test2" starting: /bin/sh -c 'echo x >>/tmp/check-3400122428/1/out; sync; touch /tmp/check-3400122428/0/test2; sleep 10'
2023-08-03T00:16:46.702Z [test] Check "chk1" failure 1 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:16:46.702Z [test] Check "chk1" failure threshold 1 hit, triggering action
2023-08-03T00:16:46.928Z [test] Service "test2" on-check-failure action is "restart", terminating process before restarting
2023-08-03T00:16:46.929Z [test] Service "test2" exited after check failure, restarting
2023-08-03T00:16:46.929Z [test] Service "test2" on-check-failure action is "restart", waiting ~100ms before restart (backoff 1)
2023-08-03T00:16:46.929Z [test] Check "chk1" failure 2 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:16:46.971Z [test] Service "test2" stopped while waiting for backoff

----------------------------------------------------------------------
FAIL: manager_test.go:984: S.TestOnCheckFailureRestartWhileRunning

manager_test.go:1044:
    c.Assert(checks[0].Failures, Equals, 1)
... obtained int = 2
... expected int = 1

OOPS: 0 passed, 1 FAILED
--- FAIL: Test (0.40s)
FAIL
FAIL    github.com/canonical/pebble/internals/overlord/servstate        0.458s
FAIL
TestOnCheckFailureIgnore
$ stress go test ./internals/overlord/servstate -check.f TestOnCheckFailureIgnore -count=1 -race
2023-08-03T00:18:28.213Z [test] Service "test2" starting: /bin/sh -c 'echo x >>/tmp/check-3858153059/1/out; sync; touch /tmp/check-3858153059/0/test2; sleep 10'
2023-08-03T00:18:28.313Z [test] Check "chk1" failure 1 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:18:28.313Z [test] Check "chk1" failure threshold 1 hit, triggering action
2023-08-03T00:18:28.444Z [test] Check "chk1" failure 2 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:18:28.513Z [test] Check "chk1" failure 3 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:18:28.544Z [test] Service "test2" stopped

----------------------------------------------------------------------
FAIL: manager_test.go:1167: S.TestOnCheckFailureIgnore

manager_test.go:1226:
    c.Assert(checks[0].Failures, Equals, 1)
... obtained int = 2
... expected int = 1

OOPS: 0 passed, 1 FAILED
--- FAIL: Test (0.34s)
FAIL
FAIL    github.com/canonical/pebble/internals/overlord/servstate        0.374s
FAIL
TestOnCheckFailureShutdown
$ stress go test ./internals/overlord/servstate -check.f TestOnCheckFailureShutdown -count=1 -race
2023-08-03T00:19:01.028Z [test] Service "test2" starting: /bin/sh -c 'echo x >>/tmp/check-1520139323/1/out; sync; touch /tmp/check-1520139323/0/test2; sleep 10'
2023-08-03T00:19:01.173Z [test] Check "chk1" failure 1 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:19:01.173Z [test] Check "chk1" failure threshold 1 hit, triggering action
2023-08-03T00:19:01.546Z [test] Service "test2" on-check-failure action is "shutdown", triggering server exit
2023-08-03T00:19:01.546Z [test] Check "chk1" failure 2 (threshold 1): exec: "will-fail": executable file not found in $PATH
2023-08-03T00:19:01.604Z [test] Service "test2" stopped

----------------------------------------------------------------------
FAIL: manager_test.go:1243: S.TestOnCheckFailureShutdown

manager_test.go:1302:
    c.Assert(checks[0].Failures, Equals, 1)
... obtained int = 2
... expected int = 1

OOPS: 0 passed, 1 FAILED
--- FAIL: Test (0.60s)
FAIL
FAIL    github.com/canonical/pebble/internals/overlord/servstate        0.628s
FAIL

@flotter
Copy link
Contributor Author

flotter commented Aug 3, 2023

@flotter synchronising on a channel seems like a good idea. Nonetheless, I am still observing intermittent failures in the following tests:

TestOnCheckFailureRestartWhileRunning
TestOnCheckFailureIgnore
TestOnCheckFailureShutdown

Sorry about this Jordan. I feel embarrassed, but will fix that asap.

@barrettj12
Copy link
Contributor

No need to be embarrassed, you're doing more than the rest of us ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent test failures: servstate.TestOnCheckFailure*
3 participants